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Abstract 


The  ability  to  predict  the  3-dimensional  structures  of  biopolymers  is  im¬ 
portant  to  AFOSR  research.  Current  research  applications  include  the  de¬ 
sign  of  biopolymer  support  matrices  for  non-linear  optical  materials  for  laser- 
resistant  optical  systems.  The  special  characteristics  of  the  biopolymer  struc¬ 
ture  determination  problem  (where  covalent  bonding  patterns  are  fixed)  dis¬ 
tinguish  it  from  chemical  structure  determination  problems  (where  bond  pat¬ 
terns  change)  and  special  methods  for  very  high  dimensionality  minimization 
are  needed.  Fortunately,  most  important  problems  require  simpler  perturba¬ 
tive  global  minimization  and  only  need  to  be  able  to  predicting  changes  in 
3-dimensional  conformations  from  a  known  initial  conformation.  This  prob¬ 
lem  will  be  much  easier  to  solve  than  the  full  “protein  folding  problem” ,  but 
is  still  complex  because  of  the  large  number  of  dimensions  (~  104)  involved. 

We  have  developed  the  fundamental  theory  and  algorithms  of  the  new 
Packet  Annealing  Method  and  tested  it  on  small  systems.  We  showed  that 
the  method  provides  a  natural  and  powerful  computational  approximation  to 
the  stochastic  description  of  biopolymer  motions  and  encompasses  other  com¬ 
peting  “potential  smoothing”  methods  as  special  cases.  Its  main  strength  is 
that  it  uncovers  and  exploits  the  intrinsic  “hidden  structures”  of  biopolymer 
energy  landscapes  to  efficiently  perform  global  minimization  using  a  hier¬ 
archical  search  procedure  which  concentrates  parallel  computing  effort  on  a 
sequence  of  selected  regions  of  decreasing  size.  Each  search  region  corresponds 
to  a  metastable  macrostate  of  the  system,  a  region  of  conformation  space  that 
is  isolated  from  the  remainder  of  the  space  by  effective  energy  barriers.  The 
effective  energy  includes  both  energetic  contributions  from  the  energy  poten¬ 
tial  function  and  entropic  contributions  resulting  from  thermal  fluctuations  of 
the  biopolymer.  It  determines  the  thermodynamic  macrostate  free-energies 
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which  (rather  than  the  mean  energies)  determine  biopolymer  structures. 

In  addition,  new  methods  for  computing  macromolecular  conformational 
transitions  and  for  molecular  dynamics  simulation  were  developed. 
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I.  INTRODUCTION 


A.  Motivation:  AFOSR  Need  for  Protein  Structure  Prediction 

The  ability  to  predict  the  changes  in  the  3-dimensional  (3-D)  structures  of  biopolymers 
that  are  induced  by  changes  in  their  covalent  structures  is  important  to  AFOSR  research.  For 
example,  research  in  the  Laser  Hardened  Optical  Materials  Branch  of  the  Electromagnetic 
Materials  Division  at  Wright  Laboratory,  Wright  Patterson  AFB  is  aimed  at  developing 
synthetic  biopolymers  which,  because  of  their  well-defined  3-D  structures,  can  provide  a 
superior  support  matrix  for  orienting  light-absorbing  chromophores  in  non-linear  optical 
materials  for  laser-resistant  optical  systems.  Dr.  Ruth  Pachter,  working  on  laser  resistance 
technology  at  Wright  Laboratory,  concludes 

Peptide  structure  predictions  and  molecular  dynamics  simulations  of  these  pep¬ 
tides  are  key  in  the  interpretation  of  the  results. 

and 

...the  importance  of  novel  developments  for  studying  large  molecular  systems..., 
especially  protein  folding  and  design,  illustrate  the  importance  for  advances  in 
new  optimization  techniques  for  determining  the  global  energy  minimum  of  these 
compounds.  Such  a  task  will  enable  rapid  advances  in  designing  new  laser  resis¬ 
tant  materials,  (from  R.  Pachter,  “Nano  Architectures  for  Agile  Optical  Thresh¬ 
olds”) 

Another  typical  application  includes  the  development  of  new  materials  with  exceptional 
properties  modelled  on  naturally-ocurring  proteins  (e.g.,  exceptionally  strong  synthetic  fibers 
based  on  the  structure  of  spider  silk).  The  computational  needs  of  this  research  are  essen¬ 
tially  identical  to  those  encountered  in  many  other  aspects  of  biotechnology — for  example, 
in  the  study  of  the  interactions  between  viral  proteins  and  drugs  designed  to  interact  and  in¬ 
terfere  with  them.  Further,  because  similar  chemical  principles  are  involved,  these  methods 
should  be  applicable  to  a  wide  variety  of  other  polymeric  structure  problems  as  well. 
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Prediction  of  the  folding  of  biopolymers  into  their  stable  3-D  structures  (the  “protein 
folding  problem” )  is  difficult  because  of  the  large  numbers  of  atomic  coordinates  (and  hence, 
mathematical  degrees  of  freedom)  to  be  determined.  Typical  problems  involve  103  -  104 
degrees  of  freedom.  Fortunately,  in  practice  it  is  not  necessary  to  predict  structures  de  novo. 
New  biopolymers  are  experimentally  developed  by  iterations  of  a  design  cycle  in  which  the 
covalent  structure  of  a  biopolymer  having  a  3-D  structure  with  fairly  good  characteristics 
is  modified  to  a  new  covalent  structure  which,  according  to  computational  predictions,  will 
have  a  modified  3-D  structure  with  even  better  properties.  Typical  alterations  include  amino 
acid  substitution  or  the  addition,  by  covalent  or  non-covalent  linkage,  of  a  small  chemical 
group  (e.g.,  a  chromophore)  to  a  biopolymer  of  known  structure.  Even  when  determining  the 
structure  of  a  new  biopolymer,  good  approximate  starting  points  can  often  be  selected  from 
extensive  databases  of  known  3-D  structures  (which  have  been  experimentally  determined 
by  X-ray  crystallography  or  nuclear  magnetic  resonance).  In  all  these  cases  it  is  the  simpler 
perturbative  structure  prediction  problem  that  is  of  primary  interest. 

B.  Global  optimization,  free-energies,  and  biopolymer  structure  prediction 

The  structure  of  a  biopolymer  is  governed  by  its  potential  energy  function  V(R),  a  com¬ 
plicated  function  of  all  the  atomic  coordinates  R  =  {fj;  i  =  1 . . .  N},  where  N  is  the  number 
of  atoms.  In  principle,  it  must  be  derived  from  quantum  mechanics,  but  since  biopolymer 
3-D  structures  are  governed  by  non-covalent  bonding  (covalent  bonding  is  invariant  in  most 
problems),  classical  approximations  appear  to  be  adequate.  However,  the  very  high  dimen¬ 
sionality  of  the  problem  [N  is  typically  ~  O(104)]  makes  the  structure  prediction  problem 
particularly  difficult  and,  with  current  algorithms,  many  orders-of-magnitude  beyond  the 
capabilities  of  even  the  largest  parallel  computers.  New  algorithms  are  needed. 

Simulated  annealing,1  in  which  the  biopolymer  thermal  fluctuations  are  simulated  at 
a  sequence  of  decreasing  temperatures,  is  one  of  the  most  powerful  approaches  available 
at  the  present  time.  Although  simulated  annealing  is  inadequate  for  biopolymer  structure 
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prediction,  it  has  demonstrated  the  value  of  annealing  approaches  in  general.  However,  most 
current  methods  attempt  to  predict  structure  by  searching  for  the  global  minimum  of  V, 
arguing  that  this  corresponds  to  the  most  energetically  stable  conformation.  Such  purely 
energetic  approaches  ignore  the  entropic  effects  that  result  from  conformational  fluctuations 
and  will  only  be  accurate  at  very  low  temperatures  near  absolute  zero,  not  at  the  working 
temperatures  of  practical  importance.  At  working  temperatures,  biopolymers  thermally 
fluctuate  through  many  conformations  according  to  the  Gibbs-Boltzmann  probability  density 
Pb : 

pB(0-,R)<xe-ev^/Z^) 

13  =  (k„T)-' 

where  ks  is  Boltzmann’s  constant,  T  is  the  temperature,  /?  is  the  “inverse  temperature”, 
and  Z((3)  is  a  normalizing  constant  (the  “partition  function,”  see  Ref.  2  for  review).  During 
the  course  of  these  rapid  fluctuations,  the  protein  rapidly  traverses  hundreds  or  thousands 
of  local  minima3  of  V  within  an  extended  region  of  conformation  space  that  we  call  a 
macrostate.  Over  longer  time  periods,  particularly  at  higher  temperatures,  the  biopolymer 
will  occasionally  make  transitions  to  other  extended  macrostate  regions.  The  probability  of 
being  in  each  macrostate  is  given  by  its  free-energy,  which  is  the  integral  of  ps  (/?;  R)  over 
the  macrostate  region2  and  which  depends  both  on  V  within  the  macrostate  and  on  the 
size  of  the  macrostate.  Thus,  it  is  the  free-energy  that  must  be  globally  minimized  during 
annealing  to  predict  biopolymer  structure. 

Each  macrostate  is  separated  from  the  others  by  energy  barriers  that  must  be  large 
compared  to  the  thermal  energy  kgT,  so  conformational  fluctuations  within  a  macrostate 
are  rapid  while  conformational  fluctuations  between  macrostates  are  slow.  Furthermore,  the 
size  and  free-energy  of  each  macrostate  depends  on  the  temperature.  Even  the  number  of 
macrostates  varies  with  temperature:  small  “child”  macrostates  that  exist  at  low  tempera¬ 
tures  will  merge  into  unified  “parent”  macrostates  at  higher  temperatures  when  the  energy 
barriers  between  them  become  small  compared  to  fc^T.  In  principle,  the  properties  of  the 
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individual  macrostates  (e.g.,  mean  conformation,  enthalpy,  entropy,  etc.)  and  the  connec¬ 
tions  between  them  can  be  computed  and  traced  in  macrostate  trajectory  diagrams.  These 
relationships  and  diagrams  can  provide  a  hierarchical  description  of  conformation  space  that 
reflects  the  underlying  kinetic  properties  of  the  biopolymer.  The  macrostates  constitute  a 
“hidden  structure”4  that  strongly  influences  algorithmic  performance  and  can  be  used  to 
advantage  once  uncovered. 


C.  A  physical  analogy 

To  illustrate,  consider  the  2-dimensional  problem  of  finding  the  lowest  region  on  the 
surface  of  the  earth.  Simulated  annealing  corresponds  to  tracking  the  position  of  a  very 
small  test-object  as  it  migrates  while  the  earth  is  shaken  with  progressively  lower  intensities 
(temperatures).  The  process  is  inefficient  because  after  each  jump  the  test-object  samples 
the  height  (energy)  over  a  region  that  is  too  small  compared  with  the  sizes  of  the  jumps. 
Because  of  this,  the  test-object  is  too  sensitive  to  small-scale  local  fluctuations  in  the  fractal¬ 
like  energy  surface  that  tend  to  mask  the  more  important  large-scale  global  structures  of 
the  surface.  A  more  efficient  procedure  would  be  to  start  with  large  “soft”  test-objects 
with  diameters  matched  to  the  sizes  of  the  oceans  (e.g.,  a  10,000  km  beach-ball)  and  to 
iteratively  minimize  their  positions  as  temperature  was  progressively  reduced.  The  sizes  of 
the  beach-balls  should  be  matched  to  the  landscape  in  a  self-consistent  manner  so  that  the 
balls  are  roughly  of  the  same  size  as  the  temperature-dependent  confining  regions  that  they 
are  searching,  i.e.,  the  macrostates  of  the  system.  The  search  trajectory  of  each  ball  will 
be  governed  by  the  effective  energy ,  an  integral  of  the  height  over  a  region  self-consistently 
chosen  to  match  the  size  of  the  ball.  At  high  temperature  the  macrostates  are  the  oceans.  As 
temperature  (shaking)  is  decreased,  the  ridges  separating  smaller  depressions  in  the  bottom 
of  the  oceans  become  important  and  the  oceanic  macrostates  divide  ( branch )  into  smaller 
child  macrostates.  For  an  efficient  parallel  search,  a  separate  ball  should  initially  be  used 
to  search  each  ocean  and,  as  temperature  is  reduced,  each  ball  should  be  replaced  with  an 
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appropriate  set  of  smaller  balls  to  search  each  child  macrostate  region.  This  process  continues 
recursively  as  children  have  children  and  the  macrostates  get  smaller  and  smaller.  Since  we 
expect  their  number  to  increase  rapidly  with  decreasing  temperature,  all  macrostates  can 
not  be  searched  and  it  is  necessary  to  select  only  the  most  promising  for  investigation. 
Success  requires  that  the  number  of  macrostates  does  not  grow  rapidly  in  comparison  with 
our  ability  to  discard  unprofitable  search  trajectories.  We  call  this  approach  the  Packet 
Annealing  Method  (see  Church  et  al.,  1996). 

The  Packet  Annealing  Method  is  particularly  suited  to  the  fractal-like  structure  of  the 
surface  of  the  Earth:  because  it  has  been  formed  by  the  action  of  a  large  number  of  quasi¬ 
independent  forces,  the  surface  contains  structure  at  multiple  spatial  scales.  Biopolymer 
energy  functions  probably  have  similar  properties  since  they  are  sums  of  very  large  numbers 
of  quasi-independent  interactions  dominated  by  two- body  terms.  Note  that  this  algorithm 
will  not  find  anomalous  minima-deep  but  very  narrow  holes  (e.g.,  oil  wells).  It  is  the  fact 
that  the  algorithm  is  explicitly  designed  to  ignore  such  anomalous  minima  that  makes  it 
highly  efficient.  This  is  not  a  disadvantage  since  anomalous  minima  of  the  energy  function 
are  not  usually  minima  of  the  free-energy  at  practical  temperatures  and,  in  any  case,  would 
not  be  expected  to  be  found  by  the  physical  system  itself. 


D.  Packet  Annealing  Method 


During  the  project  we  developed  most  of  the  formalism  and  algorithms  needed  to  imple¬ 
ment  the  Packet  Annealing  Method.  The  central  tool  is  the  effective  energy: 


HxViR) 


-2/r1  log 


deri  J <*'-*>*(*'-*)<*# 


(1) 


which  depends  on  the  integral  of  pjg(/3;  R)  over  a  Gaussian- weighted  region  parameterized 
by  a  fluctuation  tensor  A  =  (2 (3K)~?.  As  A  — *  0,  Hk  reduces  to  V.  For  non-zero  A,  Hk  is 
a  smoothed  transform  of  V  that  suppresses  all  fluctuations  on  size-scales  <  A.  Because  it  is 
smoothed,  Hk  can  be  minimized  much  more  rapidly  than  V. 
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We  have  shown  (Oresic  and  Shalloway,  1994;  Shalloway,1996)  that  HKo(/3;  provides 
a  good  approximation  to  the  free-energy  of  macrostate  o;  when  is  set  to  the  centroid  of 
the  macrostate  and  K®  is  properly  matched  to  the  size  of  the  macrostate.  Each  macrostate 
probability  distribution  can  be  approximated  by  a  Gaussian  characteristic  packet  <fPa\ 

^0(£.  _  e- §{v2(0)+{R-Rim  Kl{0)  [R-Rlm)  jZ*(p)  (2) 

which  is  described  by  the  characteristic  parameters  R^,  K®,  and  an  amplitude-fixing  scalar 
Vj.  These  parameters  are  determined  by  solving  the 

Characteristic  Packet  Equations 


and 


Integral  form 

'  W€) 


(K)<  =  M 


{pl\4>l) 


(ptlCg.  -  fiPijR  -  KhM 
(plltl) 


Differential  form 


dl i? 

df?t  dK‘ 


=  0 

(3a) 

R0=R° 

=  (Kh 

(3b) 

ft? 

II 

o 

fti 

Kii 8)  =  HKl(/3-,  <)  -  2/3-1  log  det-i  .  (4) 

Eqs.  (3)  are  a  non-linear  coupled  set  of  equations  that,  in  effect,  perform  pattern- 
recognition  to  identify  the  dominant  structures  of  the  energy  landscape  at  each  temperature. 
Each  solution  corresponds  to  a  macrostate.  For  example,  Fig.  1  displays  ps  at  a  sequence 
of  decreasing  temperatures  for  a  model  two-dimensional  potential  V  (panel  a).  The  sup¬ 
port  of  Pb  is  dispersed  at  high  temperature  and  converges  at  low  temperature  to  a  small 
region  centered  about  the  global  minimizer  of  V.  While  Pb  is  complicated  at  intermediate 
temperatures  (panel  b),  the  behavior  of  the  macrostates  at  different  temperatures  can  be 
simply  modeled  by  following  the  appearance,  movement  and  disappearance  of  regions  of 
concentrated  probability  density  with  temperature.  At  every  temperature,  each  region  a 
is  characterized  by  the  characteristic  parameters  (panel  d)  and  corresponding  characteristic 
packets  (panel  c).  As  T  decreases,  the  characteristic  packets  and  corresponding  macrostates 
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FIG.  1.  Annealing  using  metastables  states,  (a)  A  model  two-dimensional  potential  V(ri,r2). 
(b)  The  corresponding  Gibbs/Boltzmann  probability  distribution  pB  at  three  temperatures, 
Thi  >  Tme d  >  Tio-  (c)  Superposition  of  the  (squared)  characteristic  packets  (<£°  )2  that  are  solutions 
to  the  packet  equations  at  the  three  temperatures.  (A  large  number  of  characteristic  packets, 
corresponding  to  the  very  small-scale  fluctuations  of  V,  will  appear  at  lower  temperatures.)  (d) 
The  characteristic  packets  are  parametrized  by  the  positions  of  their  center-of-masses  (il°)  and 
by  their  root-mean-square  fluctuations  tensors  (A0),  represented  here  by  ellipses,  (e)  Free-energy 
vs  temperature  trajectory  diagram  for  this  temperature  range.  Solid  lines  represent  metastable 
state  trajectories  and  dotted  lines  represent  transitions.  The  discontinuities  in  the  trajectories 
correspond  to  branch  points  at  which  packets  bifurcate.  Solid  arrowheads  indicate  “escape”  or 
preferred  “capture”  transitions.  Open  arrowheads  indicate  unpreferred  transitions  that  can  be 
detected  by  the  missing-mass  procedure.  (From  Church  et  al.,  1996). 
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continuously  decrease  in  size  and  divide  into  children.  This  process  is  reported  by  the  tra¬ 
jectory  diagram  (panel  e)  which  provides  hierarchical  description  of  the  energy  landscape 
which  displays  its  intrinsic  structure  in  a  manner  that  is  not  obvious  from  inspection  of 
V  itself.  At  each  temperature  the  most  stable  macrostate  of  the  system  is  the  one  having 
lowest  free-energy  (i.e.,  macrostate  a,  /?  or  S,  depending  on  T). 

E.  Trajectory  diagrams  and  scaling  properties 

It  is  frequently  speculated  that  protein  energy  landscapes  have  an  overall  structure  that 
naturally  “funnels”  the  macromolecules  towards  their  native  folded  state.5  This  could  explain 
the  fact  that  natural  proteins  fold  very  rapidly  compared  to  the  rate  that  would  be  expected 
if  they  performed  a  random  search  for  the  native  state.6  However,  it  has  not  previously  been 
possible  to  computationally  determine  whether  this  was  true  except  for  highly  idealized 
model  systems.  The  trajectory  diagrams  enable  us,  for  the  first  time,  to  do  this. 

Consider  the  two  potentials  V(R)  shown  in  Fig.  2.  The  one  on  the  left  will  funnel  systems 
to  the  lowest  energy  state;  the  one  on  the  right  will  not.  This  is  reflected  in  the  variation 
of  Hk  with  A  (plotted  above):  a  sequence  of  local  minimizations  of  Hk  as  A  is  reduced 
converges  to  the  global  minimum  in  the  funneling  case  but  not  in  the  non-funneling  case. 
We  call  the  former  case  strong  scaling  and  the  latter  case  weak  scaling  to  emphasize  the  role 
of  the  scaling  parameter  A. 

While  the  scaling  properties  of  these  simple  one-dimensional  cases  can  be  determined  by 
graphical  examination  of  V  and  Hk,  this  is  not  possible  in  high  dimensionality  problems. 
However,  the  scaling  properties  can  be  determined  from  the  trajectory  diagrams  in  any 
number  of  dimensions.  The  critical  point  is  that  in  the  strong  scaling  case  the  trajectory  that 
leads  to  the  low-temperature  global  minimum  is  the  lowest  trajectory  at  all  temperatures. 
Thus  it  is  very  easy  to  trace.  In  contrast,  there  is  trajectory  crossing  in  the  weak  scaling 
case  and  it  is  not  sufficient  to  trace  just  the  lowest-lying  trajectory  at  each  temperature. 
The  weaker  the  scaling,  the  more  low-lying  trajectories  must  be  traced  to  be  certain  that 
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FIG.  2.  Strong  and  weak  scaling  properties.  Each  upper  panel  shows  a  sequence  of  effec¬ 
tive-energy  functions  obtained  by  convolution  with  Gaussians  having  widths  Ay  >  Amed  >  Ai0 
according  to  Eq.  (1).  (The  paths  of  searches  using  local  minimization  of  the  effective-energies  is 
shown.)  The  lower  panels  roughly  indicate  the  free-energy  vs  temperature  trajectory  diagrams 
that  correspond  to  these  potentials,  (a)  A  “strong  scaling”  case  where  the  search  is  “funneled”  to 
the  global  Tninimiim-  (b)  A  “weak”  scaling  case  where  a  single  sequence  of  downhill  searches  does 
not  find  the  global  Tninimiim.  (From  Shalloway,  1997.) 
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the  free-energy  global  minimum  will  be  found  at  the  working  temperature.  Clearly,  it  will 
be  important  to  determine  the  scaling  properties  of  the  energy  landscapes  of  biopolymers  of 
interest. 


F.  Overview 

The  fact  that  biopolymers  fold  on  time-scales  much  shorter  than  those  needed  for  a 
random  search  of  conformation  space  suggests  that  they  utilize  specific,  kinetically-favorable 
folding  pathways  to  accelerate  the  process.  Each  pathway  corresponds  to  a  specific  path 
down  a  macrostate  trajectory  diagram.  The  Packet  Annealing  Method  is  designed  to  mimic 
this  efficient  behavior  by  identifying  and  following  the  high-probability  macrostates.  The 
changes  in  macrostate  position,  size  and  number  are  traced  using  the  characteristic  packet 
equations.  This  is  efficient  because  the  effective-energy  function  is  usually  smooth  within  a 
single  macrostate  region.  The  approach  has  a  number  of  unique  features  including: 

1.  Physically  appropriate:  it  traces  the  free-energies  of  macrostates,  not  the  energies  of 
individual  conformations.  Thus,  it  accounts  for  not  only  the  mean  conformation,  but 
also  of  the  conformational  fluctuations. 

2.  Potential  smoothing  by  the  effective-energy:  the  small  fluctuations  in  the  energy  land¬ 
scape  are  removed  by  the  spatial  averaging  of  Eq.  (1).  Therefore,  minimization  using 
Hk  proceeds  much  more  rapidly  than  minimization  using  V. 

3.  Macrostate  trajectory  diagrams:  this  novel  description  of  the  energy  landscape  uncov¬ 
ers  its  qualitative  “hidden”  structure  and  provides  a  “road-map”  for  organizing  an 
efficient  parallel  search  to  the  stable  structure. 
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II.  PROGRESS 


A.  Packet  Annealing  with  Anisotropic  Averaging  Tensors 

Our  studies  before  the  project  had  only  used  isotropic  averaging  tensors  A°  in  which 
all  fluctuations  were  assumed  to  be  equal.  However,  the  order-of-magnitude  differences 
between  the  actual  magnitudes  of  the  fluctuations  of  different  atom-pair  distances  in  a 
macromolecule  must  be  matched  with  anisotropic  fluctuation  tensors.  We  tested  anisotropic 
averaging  using  the  6,  7  and  8  atom  argon  microclusters  as  test  cases.  Computational 
methods  for  iteratively  solving  the  characteristic  packet  equations  (3)  and  identifying  the 
appearance  of  children  at  branch  points  (i.e.,  subsearch  branching)  were  developed.  This 
enabled  us  to  compute  macrostate  trajectory  diagrams  for  these  systems,  the  first  time  that 
this  has  been  done  for  non-trivial  problems.  For  example,  the  7-atom  microcluster  has  4 
conformational  isoforms  (Fig.  3a)  corresponding  to  local  minima  of  the  potential.  While  the 
15  (=3x7—  6)-dimensional  potential  V ( R )  can  not  be  visualized,  the  free-energy  and  mean- 
energy  trajectory  diagrams  (Fig.  3b)  reveal  the  hierarchical  organization  of  the  macrostates 
and  associated  isoforms.  The  critical  temperatures  (at  the  positions  of  the  dotted  lines)  give 
the  magnitudes  of  the  effective  energy  barriers  between  these  states.  These  studies,  which 
demonstrated  our  ability  to  compute  macrostates  and  trajectory  diagrams  for  non-covalently 
linked  systems,  are  discussed  in  more  detail  in  Oresic  and  Shalloway  (1994). 

B.  Packet  Annealing  in  covalent  systems 

The  effective-energy  method  was  originally  developed  in  Cartesian  coordinates,  but 
this  is  inefficient  for  biopolymer  calculations  because  of  the  highly  non-linear  constraints 
imposed  by  the  interatomic  covalent  bonds.  Higher  efficiency  can  be  obtained  using  “internal 
coordinates”,  a  combination  of  bond-length,  bond-angle  and  torsion-angle  variables.  A 
major  goal  (and  accomplishment)  of  the  project  was  to  implement  the  method  for  covalently 
bonded  biopolymers  using  internal  coordinates. 
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FIG.  3.  Trajectory  diagrams  for  7-atom  argon  microclusters,  (a)  Isomers  at  T  =  0.  (b) 
Mean-energy  (E)  and  free-energy  (F)  trajectory  diagrams.  The  connections  between  trajectories 
(dotted  lines  and  arrows)  indicate  transition  temperatures  where  child  macrostates  merge  with  the 
parental  states.  The  hierarchical  organization  of  the  landscape  is  apparent:  isomers  <2  and  t$  are 
children  of  o\  which  itself  is  a  child  of  t\.  (From  Oresic  and  Shalloway,  1994.) 
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1.  Multivariate  wrapped- Gaussian  distribution  for  biopolymers 


We  were  surprised  to  find  that  a  basic  component  of  required  mathematical  theory,  an 
accurate  analog  of  the  Cartesian  coordinate  Gaussian  distribution  for  internal  coordinates, 
had  not  yet  been  developed.  (Most  workers  used  the  quasi-harmonic  approximation  even 
though  this  was  very  inaccurate  for  this  application.)  Thus,  our  first  step  was  to  develop 
the  appropriate  analog,  the  “multivariate  wrapped-Gaussian  distribution” .  A  comparision 
of  the  performance  of  this  method  with  the  older  quasi-harmonic  method  is  shown  in  Fig. 
4.  This  is  a  general-purpose  tool  which  should  find  application  in  other  biopolymer  studies 
in  addition  to  our  own.  See  Church  and  Shalloway  (1995,  1996). 


2.  Macrostate  branching  in  distance-geometry  variables 


A  major  difficulty  in  analyzing  biopolymers  is  that  there  are  large  differences  between 
the  effects  of  different  torsion  angles  on  conformation.  Small  changes  in  backbone  angles 
near  the  center  of  a  chain  greatly  affect  conformation  while  changes  in  distal  side-chains 
have  little  effect.  This  results  in  large  matrix  condition  numbers  that  reduce  efficiency.  We 
showed  that  we  could  bypass  this  difficulty  by  describing  the  packets  using  pair-wise  inter¬ 
atomic  distance  variables,  i.e.,  distance-geometry  coordinates.  This  representation  works 
well  because  most  metastable  states  can  be  distinguished  using  only  a  few  distance  variables 
so  it  is  not  necessary  to  consider  all  of  the  0{N2)  variables  (e.g.,  see  Fig.  5).  The  description 
involves  projecting  the  system  probability  density  pa(R)  from  torsion- angle  variables  ©  to 
the  distance  variables  d: 


pH(d)  =  J pa[R(Q )]  |Vdy  [/?(©)]  |  8(d  -  dy [#(©)])  det*[/(0)]  d©  (5) 


where 


T(Q)  =  V'  dflj  dRj 
d6n  ddp  • 


(6) 


The  division  of  a  macrostate  into  child  macrostates  is  readily  identified  algorithmically  by 
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FIG.  4.  Scatter  plots  of  two-dimensional  datasets  having  various  amounts  of  correlation  and 
fluctuation.  The  panels  on  the  left  (b  and  d)  are  shaded  to  show  the  e-2  regions  generated  by 
the  angular  quasiharmonic  distribution;  the  panels  on  the  right  show  the  corresponding  regions 
generated  by  the  multivariate  wrapped-Gaussian  distribution.  The  significantly  improved  cor¬ 
respondence  between  the  scatter  plot  and  the  shaded  region  reflects  the  higher  accuracy  of  the 
wrapped-Gaussian  distribution.  (From  Church  and  Shalloway,  1996.) 
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FIG.  5.  Pairwise  distances  differentiate  two  conformations  of  Met-enkephalin.  (From  Church 
et  al.,  1996.) 
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inspecting  the  p distributions  (e.g.,  see  Fig.  6).  See  Church  et  al.  (1996a)  and  Church  et 
al.  (1998). 

3.  Trajectory  diagram  analysis  of  peptides 

The  methods  described  above  enabled  us  to  begin  testing  the  method  on  the  pentapeptide 
Met-enkephalin  which  has  been  used  as  a  test-case  for  many  theoretical  studies.  Our  goal 
was  to  compute  its  trajectory  diagram  and  to  determine  its  scaling  properties.  The  intrin¬ 
sic  parallelism  of  the  effective-energy  method  provides  an  excellent  opportunity  for  coarse¬ 
grained  parallel  computing.  These  studies  were  conducted  using  both  the  16-processor  Sili¬ 
con  Graphics  Onyx  and  512-processor  IBM  SP2  parallel  computers  available  at  the  Cornell 
Theory  Center.  An  algorithm  was  developed  for  automatically  detecting  the  macrostate 
division  points  and  assigning  processors  to  the  tracing  of  the  low-lying  trajectories.  The 
Met-enkephalin  probability  and  mean-energy  trajectory  diagrams  are  shown  in  Figs.  7  and 
8.  The  probability  trajectory  diagram  (Fig.  7)  displays  the  Met-enkephalin  macrostates  hav¬ 
ing  the  highest  probability  (lowest  free-energy)  at  each  temperature  as  well  as  the  trajectory 
which  leads  to  the  global  energy  minimum.  It  displays  some  features  that  will  be  common 
to  all  biopolymers:  (1)  At  high  temperature  there  is  only  one  macrostate  which  fluctuates 
throughout  all  of  conformation  space  (the  peptide  is  denatured).  (2)  As  temperature  de¬ 
creases,  the  total  probability  is  distributed  amongst  an  increasing  number  of  macrostates. 
(3)  At  very  low  temperatures,  all  of  the  probability  becomes  concentrated  in  the  macrostate 
containing  the  global  energy  minimum.  It  is  important  to  note  that  the  trajectory  that 
leads  to  the  global  minimum  at  300° K  (log  T  ~  —0.22  in  the  units  used  in  Fig.  7)  is  not 
the  highest  probability  trajectory  at  intermediate  temperatures  (e.g.,  —0.1  <  logT  <  0.1), 
indicating  that  Met-enkephalin  has  weak-scaling  in  free-energy.  However,  an  important  dis¬ 
covery  was  that  it  has  strong  scaling  in  mean-energy:  as  seen  in  Fig.  8  the  global  minimum 
trajectory  remains  near  the  bottom  of  the  mean-energy  trajectory  diagram  for  all  temper¬ 
atures.  Furthermore,  there  is  a  gap  between  the  global  minimum  trajectory  and  the  other 
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T  =  17  kcal/mol 


d,N.2CA  (Angstroms) 


dm.2CA  (Angstroms) 


FIG.  6.  Projected  probability  distribution  plN<2CA  at  high  temperature  (a)  and  at  low  temper¬ 
ature  (b).  At  the  low  temperature  the  distribution  satisfies  the  criteria  for  bifurcation  into  two 
macrostates:  one  with  d\ff,2CA  <  d*  and  one  with  d\jj£CA  >  d* .  (From  Church  et  al.,  1996.) 
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trajectories  over  a  significant  temperature  range.  This  means  that  one  can  easily  find  the 
Met-enkephalin  global  minimum  by  just  tracking  a  small  number  of  macrostate  trajectories 
having  the  low  mean-energy. 

We  have  performed  similar  studies  with  other  peptides  to  test  the  generality  of  this 
phenomenon.  Interestingly,  the  mean-energy  trajectory  diagram  of  Leu-enkephalin,  which 
differs  from  Met-enkephalin  by  a  single  amino  acid  substitution,  displays  only  weak  scaling 
and  does  not  have  the  energy  gap.  This  implies  that  global  minimization  of  Leu-enkephalin 
should  be  more  difficult  than  minimization  of  Met-enkephalin.  This  prediction  has  been 
verified  by  comparing  the  simulated  annealing  of  the  two  molecules.  See  Church  et  al. 
(1996)  and  Church  et  al.  (1998). 

Most  recently,  we  have  developed  code  to  implement  the  modified  image  electrostatics 
(MIMEL)7  approach  to  empirical  solvation  and  incorporate  it  into  our  model. 

C.  Dynamical  basis  for  effective-energy  global  optimization 

We  showed  how  the  characteristic  packet  equations  emerge  from  stochastic  analysis  of 
macromolecular  motions  using  the  Smoluchowski  (Fokker-Planck)  equation.  This  leads  to 
a  time-dependent  description  of  the  system  conformational  probability  distribution  that 
is  analogous  to  the  wave-function  description  of  the  Schrodinger  equation.  These  studies 
showed  how  the  Packet  Annealing  Method  is  related  to  the  macromolecular  dynamics  and  led 
to  a  new,  efficient  variational  method  for  computing  transition  rates  between  macromolec¬ 
ular  conformational  states  (see  below).  This  provides  an  important  extension  to  the  global 
minimization  problem  by  allowing  us  to  compute  the  rates  of  folding  and  conformational 
change.  See  Shalloway  (1996). 

1.  Variational  calculation  of  conformational  transition  rates 

At  the  present  time  conformational  reaction  rates  are  generally  approximated  by  tran¬ 
sition  state  and  reactive  flux  methods  (Ref.  8,  for  review).  These  methods  require  the 
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FIG.  7.  Peptide  trajectory  diagrams.  The  high-probability  trajectories  in  the  probability  tra¬ 
jectory  diagram  of  Met-enkephalin.  The  trajectory  which  leads  to  the  global  minimum  (marked 
with  an  x)  is  also  shown.  The  fact  that  its  probability  goes  down  to  ~  10-4  at  log  T  ~  0.1  implies 
that  the  scaling  is  very  weak  in  probability  (or,  equivalently,  free  energy).  (From  Church  and 
Shalloway,  1998.) 
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FIG.  8.  Low  mean-energy  trajectories  for  Met-enkephalin.  The  trajectory  which  leads  to  the 
global  minimum  has  the  lowest,  or  close  to  lowest,  mean-energy  at  all  temperatures — an  example 
of  strong  scaling.  Note  also  the  mean-energy  gap  between  this  and  the  other  trajectories  in  the 
complicated  logT  ~  —0.2  region.  (From  Church  and  Shalloway,  1998.) 
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specification  of  one-dimensional  “reaction  coordinates”  that  can  describe  the  progression  of 
the  system  from  one  conformational  macrostate  to  another.  However,  there  is  no  general 
method  for  identifying  appropriate  reaction  coordinates,  and  it  is  very  difficult,  if  not  impos¬ 
sible,  to  find  them  for  complex  multidimensional  systems  like  macromolecules.  Even  when 
they  can  be  found,  rate  computations  are  exceedingly  expensive  and  often  inaccurate. 

The  Smoluchowski  formulation  mentioned  above  leads  to  a  reaction  path-independent 
method  for  computing  transition  rates  that  can  avoid  these  difficulties.  Based  on  the 
quantum-mechanical  analogy,  we  have  shown  that  transition  rates  can  be  efficiently  de¬ 
termined  by  using  the  Rayleigh- Ritz  variational  principle  to  compute  the  eigenvalues  of  the 
first  excited  states  of  the  Smoluchowski  “hamiltonian” . 

We  have  developed  and  tested  this  variational  method  using  model  potentials  and  the 
argon  microcluster  system  as  test  cases.  Computational  methods  for  iteratively  solving  the 
variational  equations  were  developed  and  tested,  and  showed  that  the  method  was  about  two 
orders-of-magnitude  more  efficient  than  Brownian  dynamics  for  equal  accuracy.  (Ulitsky  and 
Shalloway,  1998).  As  part  of  this  project  we  have  developed  a  new  “contangency”  method 
for  finding  saddle  points  of  effective  energy  landscapes  (Ulitsky  and  Shalloway,  1997)  which 
can  be  used  by  other  transition  rate  computation  methods  as  well. 

D.  Relationship  between  the  Packet  Annealing,  Diffusion  Equation,  and  Gaussian 

Density  Annealing  methods 

Our  effective-energy  method  and  the  competing  Diffusion  Equation9,10  and  Adiabatic 
Gaussian  Density  Annealing11,12  methods  all  use  Gaussian  convolutions  to  smooth  the 
macromolecular  potential.  However,  the  relationship  bvetween  these  methods  has  not  pre¬ 
viously  been  understood.  We  have  now  shown  that  they  are  hierarchically  related:  the 
Diffusion  Equation  Method  is  a  special  case  of  the  Gaussian  Density  Annealing  method 
(restriction  to  isotropic  averaging),  and  the  Gaussian  Density  Annealing  method  is  in  turn 
a  special  case  of  the  Packet  Annealing  Method  (restriction  to  anisotropies  along  fixed  axes, 
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single  packets,  and  high  temperature).  Thus,  the  Packet  Annealing  Method  provides  a 
general  formulation  for  this  entire  class  of  models.  See  Shalloway  (1997). 

E.  Additional  progress 

1.  Spatial  interpolation  integrators  for  molecular  dynamics  simulation 

Molecular  dynamics  simulations  are  much  less  efficient  than  the  effective-energy  tech¬ 
niques  discussed  above  but  are  useful  for  studying  the  details  of  fast  conformational  transi¬ 
tions.  They  are  most  commonly  performed  using  the  Verlet  algorithm  to  integrate  Newton’s 
equation.  However,  this  is  a  general-purpose  integrating  algorithm  which  does  not  exploit 
an  important  special  property  of  Newton’s  equation  for  biopolymers — that  the  force  is  the 
gradient  of  a  scalar  potential.  We  have  shown  that  a  new  class  of  spatial  interpolation  algo¬ 
rithms,  which  do  exploit  this  property,  can  enhance  efficiency  by  factors  of  4-5.  See  Gueron 
and  Shalloway  (1996). 

2.  Benchmark  for  molecular  dynamics  simulations 

In  developing  the  spatial  interpolation  method  we  recognized  that  the  existing  methods 
for  evaluating  the  accuracy  of  molecular  dynamics  simulations  were  inadequate.  While 
energy-conservation  is  often  used,  this  is  a  crude  measure  that  does  not  accurately  reflect 
performance.  To  solve  this  problem  we  developed  a  new  method  which  uses  the  “residual 
force”  VV(i?)  +m  d? R/ dt2  (where  V  is  the  is  the  potential  and  R  is  the  computed  trajectory) 
as  a  measure  of  accuracy.  A  software  package  supporting  this  benchmark  is  being  prepared 
for  general  release.  See  Gans  et  al.  1998. 

3.  Correlation  between  codon  usage  and  protein  secondary  structure 

Biopolymer  production  in  most  cases  involves  expression  of  modified  genes  in  heterol¬ 
ogous  bacterial  or  plant  expression  systems.  Genes  are  constructed  assuming  that  it  does 
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not  matter  which  of  the  synonymous  codons  that  encode  a  specific  amino  acid  are  used. 
While  evidence  indicates  that  this  assumption  is  usually  true,  it  may  not  always  be  true, 
and  specific  synonymous  codon  choices  might  be  important  for  directing  protein  folding. 
This  (controversial)  hypothesis  might  explain  the  inability  of  some  genes  to  be  expressed  in 
heterologous  sources.  We  statistically  analyzed  protein  and  nucleic  acid  databases  to  test 
this  hypothesis  and  found  strong  evidence  for  correlation  between  some  synonymous  codons 
and  protein  structures.  See  Oresic  et  al.,  1998. 
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